
Welcome everybody to the next part of deep learning. Today we want to finish talking about common practices, and in particular we want to have a look at evaluation. Of course, we need to evaluate the performance of the models that we have trained so far. We have set up the training, chosen hyperparameters, and configured all of this. Now we want to evaluate the generalization performance on previously unseen data. This means the test data, and it's time to open the vault.

Remember: of all things, the measure is man. Data is annotated and labeled by humans, and during training all labels are assumed to be correct. But of course, to err is human. This means that we might have ambiguous data.

The ideal situation that you actually want for your data is that it has been annotated by multiple human raters. Then you can take a mean or majority vote, as in the sketch below.
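As an illustration, a minimal sketch of a majority vote over several annotators could look like this (the function and variable names are my own, not from the lecture):

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among several annotators.

    Ties are broken by whichever label Counter encounters first;
    in practice you would want an explicit tie-breaking rule.
    """
    counts = Counter(labels)
    label, _ = counts.most_common(1)[0]
    return label

# Three human raters labeled the same sample:
print(majority_vote(["angry", "annoyed", "angry"]))  # -> "angry"
```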

There is also a very nice paper by Stefan Steidl from 2005 [8]. It introduces an entropy-based measure that takes into account the confusion of the human reference labelers. This is very useful in situations where you have unclear labels.

Emotion recognition in particular suffers from this problem, as humans also sometimes confuse classes like angry versus annoyed, while they are not very likely to confuse angry versus happy, since that is a very clear distinction. There are different degrees of happiness; sometimes you are just a little bit happy. In these cases it is really difficult to differentiate happy from neutral. This is also hard for humans.

With prototypical emotions, for example when actors act them out, you get emotion recognition rates way over 90%. With real data and real emotions as they occur in daily life, prediction is much harder.

This can also be seen in the labels and their distribution. For a prototype, all of the raters will agree that the observation clearly belongs to a particular class. For nuanced and less clear emotions, the raters will produce a less peaked or even uniform distribution over the labels, because they too cannot clearly assess the specific sample.

So mistakes by the classifier are obviously less severe if the same classes are also confused by humans. Exactly this is considered in Steidl's entropy-based measure.
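The exact formulation is in the paper [8]; as a rough illustration of the underlying idea only, the following sketch (my own simplification, not Steidl's actual measure) computes the entropy of the human label distribution for a sample. High entropy means the humans disagree, so a classifier error on that sample should be penalized less:

```python
import math

def label_entropy(rater_labels, classes):
    """Entropy (in bits) of the label distribution given by human raters.

    A peaked distribution (all raters agree) has entropy close to 0;
    a uniform distribution (maximal disagreement) has maximal entropy.
    """
    n = len(rater_labels)
    probs = [rater_labels.count(c) / n for c in classes]
    return -sum(p * math.log2(p) for p in probs if p > 0)

classes = ["angry", "annoyed", "happy", "neutral"]

# Prototypical sample: all raters agree -> entropy 0.0
print(label_entropy(["angry"] * 4, classes))

# Ambiguous sample: raters split between angry/annoyed -> entropy 1.0 bit
print(label_entropy(["angry", "angry", "annoyed", "annoyed"], classes))
```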

Now, if we look into performance measures, we want to take into account the typical classification measures. They are built around the false negatives, the true negatives, the true positives, and the false positives. From these, for binary classification problems, you can compute true and false positive rates. This typically leads to numbers like the accuracy, which is the number of true positives plus true negatives over the total number of positives and negatives.

Then there is the precision, also called the positive predictive value, which is computed as the number of true positives over the number of true positives plus false positives. And there is the so-called recall, which is defined as the true positives over the true positives plus false negatives.
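A small sketch following exactly these definitions (the function name and the example counts are my own, purely for illustration):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and recall from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # true positive rate / sensitivity
    return accuracy, precision, recall

# Example: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives
acc, prec, rec = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={acc:.2f}, precision={prec:.2f}, recall={rec:.2f}")
# accuracy=0.85, precision=0.80, recall=0.89
```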

Part of a video series: Deep Learning - Plain Version 2020
Accessible via: Open access
Duration: 00:12:54 min
Recording date: 2020-10-12
Uploaded on: 2020-10-12 15:06:36
Language: en-US

Deep Learning - Common Practices Part 4

This video discusses how to evaluate deep learning approaches.


Further Reading:
A gentle Introduction to Deep Learning

References:
[1] M. Aubreville, M. Krappmann, C. Bertram, et al. “A Guided Spatial Transformer Network for Histology Cell Differentiation”. In: ArXiv e-prints (July 2017). arXiv: 1707.08525 [cs.CV].
[2] James Bergstra and Yoshua Bengio. “Random Search for Hyper-parameter Optimization”. In: J. Mach. Learn. Res. 13 (Feb. 2012), pp. 281–305.
[3] Jean Dickinson Gibbons and Subhabrata Chakraborti. “Nonparametric statistical inference”. In: International encyclopedia of statistical science. Springer, 2011, pp. 977–979.
[4] Yoshua Bengio. “Practical recommendations for gradient-based training of deep architectures”. In: Neural networks: Tricks of the trade. Springer, 2012, pp. 437–478.
[5] Chiyuan Zhang, Samy Bengio, Moritz Hardt, et al. “Understanding deep learning requires rethinking generalization”. In: arXiv preprint arXiv:1611.03530 (2016).
[6] Boris T Polyak and Anatoli B Juditsky. “Acceleration of stochastic approximation by averaging”. In: SIAM Journal on Control and Optimization 30.4 (1992), pp. 838–855.
[7] Prajit Ramachandran, Barret Zoph, and Quoc V. Le. “Searching for Activation Functions”. In: CoRR abs/1710.05941 (2017). arXiv: 1710.05941.
[8] Stefan Steidl, Michael Levit, Anton Batliner, et al. “Of All Things the Measure is Man: Automatic Classification of Emotions and Inter-labeler Consistency”. In: Proc. of ICASSP. IEEE - Institute of Electrical and Electronics Engineers, Mar. 2005.
